Regularizing linear discriminant analysis for speech recognition
Abstract
Feature extraction is an essential first step in speech recognition applications. In addition to static features extracted from each frame of speech data, it is beneficial to use dynamic features (called ∆ and ∆∆ coefficients) that use information from neighboring frames. Linear Discriminant Analysis (LDA) followed by a diagonalizing maximum likelihood linear transform (MLLT), applied to spliced static MFCC features, yields important performance gains over MFCC+∆+∆∆ features in most tasks. However, since LDA is obtained from statistical averages trained on limited data, it is reasonable to regularize the computation of the LDA transform using prior information and experience. In this paper, we regularize LDA and heteroschedastic LDA transforms using two methods: (1) using statistical priors for the transform in a maximum a posteriori (MAP) formulation, and (2) using structural constraints on the transform. As the prior, we use a transform that computes static+∆+∆∆ coefficients. Our structural constraint takes the form of a block-structured LDA transform in which each block acts on the same cepstral parameter across frames. The second approach suggests using new coefficients for the static, first-difference, and second-difference operators, in place of the standard ones, to improve performance. We test the new algorithms on two different tasks, namely TIMIT phone recognition and AURORA2 digit sequence recognition in noise. We obtain consistent improvements over MFCC features in our experiments. In addition, we obtain encouraging results in some AURORA2 tests as compared to LDA+MLLT features.
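To make the idea of "a transform that computes static+∆+∆∆ coefficients" concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common regression formula for ∆ coefficients with a window of 2 frames, 13 cepstral coefficients per frame, and a 9-frame spliced input stacked frame by frame; none of these values are stated in the abstract, and the function names are hypothetical.

```python
import numpy as np

def delta_filter(window=2):
    """Regression weights w_n = n / (2 * sum_k k^2) for offsets n = -window..window (assumed formula)."""
    n = np.arange(-window, window + 1)
    return n / (2.0 * np.sum(np.arange(1, window + 1) ** 2))

def temporal_filters(window=2):
    """Return a 3 x (4*window+1) matrix whose rows are the static, delta and delta-delta filters."""
    d = delta_filter(window)
    dd = np.convolve(d, d)               # applying the delta filter twice
    context = 4 * window + 1             # frames needed for delta-delta
    static = np.zeros(context)
    static[context // 2] = 1.0           # pick the centre frame
    delta = np.zeros(context)
    delta[window:3 * window + 1] = d     # centre the shorter delta filter
    return np.stack([static, delta, dd])

def splice_transform(filters, n_cep=13):
    """Expand per-coefficient temporal filters into one matrix acting on a spliced vector.

    The spliced input is assumed to be frame-major: [c(t-4), ..., c(t+4)].
    The output is ordered as [static, delta, delta-delta].
    """
    return np.kron(filters, np.eye(n_cep))

F = temporal_filters(window=2)           # shape (3, 9)
T = splice_transform(F, n_cep=13)        # shape (39, 117)
```

Under these assumptions, `T` is block structured: every cepstral coefficient is filtered across frames by the same 3 x 9 matrix `F`, and the sparsity pattern is that of `np.kron(np.ones((3, 9)), np.eye(13))`. A block-structured transform in the sense of the abstract would keep a pattern of this kind while estimating the filter weights from data instead of fixing them to the standard values.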
Related resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Konuşma Tanıma İçin Heteroskedastik Ayırtaç Analizinin Düzenlileştirilmesi / Regularizing Heteroschedastic Discriminant Analysis for Speech Recognition
Linear Discriminant Analysis (LDA) followed by a diagonalizing maximum likelihood linear transform (MLLT) applied to spliced static MFCC features yields important performance gains as compared to MFCC+dynamic features in most speech recognition tasks. It is reasonable to regularize LDA transform computation for stability. In this paper, we regularize LDA and heteroschedastic LDA transforms usin...
Improved Linear Predictive Coding Method for Speech Recognition
In this paper, improved Linear Predictive Coding (LPC) coefficients of the frame are employed in the feature extraction method. In the proposed speech recognition system, the static LPC coefficients + dynamic LPC coefficients of the frame were employed as a basic feature. The framework of Linear Discriminant Analysis (LDA) is used to derive an efficient and reduced-dimension speech parametric s...
Statistical integration of temporal filter banks for robust speech recognition using linear discriminant analysis (LDA)
This paper presents a study on statistical integration of temporal filter banks for robust speech recognition using linear discriminant analysis (LDA). The temporal properties of stationary features were first captured and represented using a bank of well-defined temporal filters. Then these derived temporal features can be integrated and compressed using the LDA technique. Experimental results...